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Abstract 

Herbivorous insects are among the most successful radiations of life. However, we know little about the processes underpinning the 
evolution of herbivory. We examined the evolution of herbivory in the fly, Scaptomyza flava, whose larvae are leaf miners on species of 
Brassicaceae, including the widely studied reference plant, Arabidopsis thaliana (Arabidopsis). Scaptomyza flava is phylogenetically 
nested within the paraphyletic genus Drosophila, and the whole genome sequences available for 1 2 species of Drosophila facilitated 
phylogenetic analysis and assembly of a transcriptome for 5. flava. A time-calibrated phylogeny indicated that leaf mining in 
Scaptomyza evolved between 6 and 16 million years ago. Feeding assays showed that biosynthesis of glucosinolates, the major 
class of antiherbivore chemical defense compounds in mustard leaves, was upregulated by 5. flava larval feeding. The presence of 
glucosinolates in wild-type (WT) Arabidopsis plants reduced 5. flava larval weight gain and increased egg-adult development time 
relative to flies reared in glucosinolate knockout (GKO) plants. An analysis of gene expression differences in 5-day-old larvae reared on 
WT versus GKO plants showed a total of 341 transcripts that were differentially regulated by glucosinolate uptake in larval 5. flava. 
Of these, approximately a third corresponded to homologs of Drosophila melanogaster genes associated with starvation, dietary 
toxin-, heat-, oxidation-, and aging-related stress. The upregulated transcripts exhibited elevated rates of protein evolution compared 
with unregulated transcripts. The remaining differentially regulated transcripts also contained a higher proportion of novel genes than 
the unregulated transcripts. Thus, the transition to herbivory in Scaptomyza appears to be coupled with the evolution of novel genes 
and the co-option of conserved stress-related genes. 
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Introduction 

Phytophagous insects are among the most evolutionarily suc- 
cessful radiations. Herbivorous taxa comprise almost 50% of 



living insect species and 25% of living metazoan species 
(Ehrlich and Raven 1964; Bernays 1998) and are more diverse 
than their aphytophagous sister lineages (Mitter and Farrell 
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1988; Farrell 1998). Adapting to a herbivorous lifestyle is 
thought to have been a difficult transition for most insects, 
however, due to morphological and physiological challenges 
of consuming living plant tissues that are typically well de- 
fended against insects (Frankel 1959; Ehrlich and Raven 
1964; Southwood 1973). Some antiherbivore defenses, in- 
cluding insecticidal compounds produced by plants, are dra- 
matically induced within hours of herbivore attack and are 
tailored toward specific guilds of insect enemies (Kunitz 
1945; Bostock 2005; Textor and Gershenzon 2009). In addi- 
tion, these compounds are often effective against other nat- 
ural enemies, such as microbes (Fan et al. 2011), and 
vertebrates (Magnanou et al. 2009). 

The genomic changes that accompany a shift from an 
aphytophagous to phytophagous lifestyle are not well under- 
stood. The transition to herbivory has evolved independently 
many times in the insects and frequently involved adaptation 
to a more specialized host plant range. Indeed, most herbiv- 
orous species are specialized in host range and are only able to 
complete development on a limited number of species. Many 
evolutionary changes are likely to be required for a successful 
transition to herbivory, including the ability to find suitable 
host plants, utilize potentially nutritionally imbalanced plant 
tissues, and cope with host plant defenses (Bernays and 
Chapman 1994; Govind et al. 2010). 

To study the transition to herbivory from an evolutionary 
perspective, we have focused on a recently derived herbivo- 
rous species that is closely related to a model genetic species. 
Our long-term goal is to understand how the evolution of 
novel genes and/or the recruitment of existing metabolic or 
signaling pathways have enabled herbivores to adapt to a 
fundamentally new niche. Most phytophagous insects are lep- 
idopterans or coleopterans. Because the evolution of herbivory 
in these lineages is estimated to have occurred in the 
Cretaceous, 145-65 million years ago (Whalley 1977; 
Moran 1989; Labandeira et al. 1994; Bernays 1998; Farrell 
1998), using comparative genomics to dissect the genetic 
bases of these transitions would be difficult. However, herbiv- 
ory has also evolved many times in dipterans (Labandeira 
2003), and the family Drosophilidae includes 12 species with 
published genome sequences and at least two genera 
(Scaptomyza and Scaptodrosophila) with phytophagous 
members (Hackman 1959; Bock and Parsons 1975; Clark 
et al. 2007). These provide the opportunity to analyze more 
recent transitions to herbivory. 

The Scaptomyza lineage is nested in the paraphyletic sub- 
genus Drosophila (O'Grady and Desalle 2008). Most of the 
described drosophilid species (ca., 2,000) are saprophagous, 
feeding externally on microbes growing on decaying vegeta- 
tion (e.g., Drosophila melanogaster), ripe fruits (e.g., D. sechel- 
lia), or fruiting bodies of fungi (e.g., D. recens). Although 
decaying vegetation and fungi can be toxin-rich and likely 
present a barrier to colonization (Jaenike et al. 1983; Frank 
and Fogleman 1 992; Jones 2001 , 2005; Markow and O'Grady 



2005; Dworkin and Jones 2009), it seems unlikely that host 
plants or fungi would mount a defense response against these 
insects. However, drosophilids in the genus Scaptomyza 
(fig. 1) have evolved the ability to feed on living angiosperm 
leaves, which activate inducible defenses in response to feed- 
ing damage (Hackman 1 959; Whiteman et al. 201 1 ). It is likely 
that a variety of preformed and inducible defense compounds 
were novel traits encountered by Scaptomyza and other her- 
bivorous Drosophilidae in their transition from a saprophagous 
to a phytophagous lifestyle. 

The ancestral host plants of the earliest Scaptomyza herbi- 
vores are unknown, but many herbivorous Scaptomyza spe- 
cies are monophagous or oligophagous on plants in the order 
Brassicales (Hackman 1959). The larvae of 5. flava feed on 
Arabidopsis thaliana (hereafter referred to as Arabidopsis) 
and other species of Brassicales in the wild (Chittenden 
1 902; Whiteman et al. 201 1 ) and require living plants to com- 
plete their life cycle (Whiteman etal. 201 1). Because they have 
been found on many members of the Brassicales and on spe- 
cies in a few other plant families (Caryophyllaceae and 
Fabaceae), we consider this species to be oligophagous 
(Martin 2004). We took advantage of the availability of 
Arabidopsis mutants blocking the synthesis of insecticidal 
compounds to test the hypothesis that major changes associ- 
ated with the transition to feeding on living plants can be 
detected at the genomic level in 5. flava. 

We focused our attention on interactions between larval 
5. flava and glucosinolates, which are well characterized func- 
tionally and metabolically in Arabidopsis (Halkier and 
Gershenzon 2006) and which we hypothesized were a likely 
barrier to colonization by ancestors of 5. flava. Glucosinolates 
are a major barrier to insect colonization of Brassicales, includ- 
ing economically important crops such as canola {Brassica 
napus), broccoli, cabbage (B. oleracea), and papaya (Carica 
papaya). Glucosinolates (and related compounds called cama- 
lexins [Hull et al. 2000]) are amino acid-derived thioglucosides 
that are present constitutively in tissues of plants in the 
Brassicaceae and are highly inducible after herbivore attack, 
increasing in concentration in leaves by up to 40-fold (Halkier 
and Gershenzon 2006; Textor and Gershenzon 2009). During 
tissue damage, glucosinolates are hydrolyzed by endogenous 
myrosinases and are transformed into toxic, electrophilic mus- 
tard oils (isothiocyanates) that can damage proteins and DNA 
(Halkier and Gershenzon 2006). 

We used a variety of approaches, including genome-wide 
transcriptional profiling by RNA-seq, to characterize the evo- 
lutionary and functional impact of glucosinolates, a major 
plant antiherbivore defense, on 5. flava. Our results provide 
a first glimpse into how the genomes of insects are shaped 
by the major ecological transition to herbivory from 
nonherbivorous ancestors. We found that new and conserved 
stress-related genes appear to be co-opted by 5. flava in its 
response to host plant defenses. The powerful tools available 
for the genetic model organisms Drosophila and Arabidopsis 
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Fig. 1. — Chronogram of combined nucleotide sequence data set from 9 genes from the 12 Drosophila species with completely sequenced genomes 
and multiple Scaptomyza species (5. flava, S. nigrita, 5. pallida, and 5. [Hemiscaptomyza] sp.). Initial tree was a maximum likelihood partitioned analysis of 
9 genes inferred using RAxML (-In L = -29,522.877). See table 1 for bootstrap support values for nodes. Nodes were constrained to ages estimated from 
Tamura et al. (2004). The two leaf-mining Scaptomyza species are resolved as sister taxa. Scaptomyza (Parascaptomyza) pallida and Scaptomyza 
(Hemiscaptomyza) sp. do not mine leaves of plants and can be reared on Drosophila media, and larvae are associated with decaying vegetation. The 
split between the most recent common ancestor of the two leaf-mining taxa, 5. flava and 5. nigrita (labeled in green), was estimated at 6.31 (±0.63 SD) 
MYBP, and we inferred this to be the most recent date for the evolution of mustard leaf mining. 



should facilitate functional characterization of genes identified 
here as candidates, allowing us to further elucidate the mo- 
lecular changes that accompany the ecological transition to 
herbivory and host plant specialization. 

Materials and Methods 

Phylogenetic Inference and Dating 

To determine the relationships and times of divergence be- 
tween the 12 Drosophila species with completely sequenced 
genomes and 5. flava, we inferred a phylogeny using a codon- 
partitioned, concatenated data set of ~5,000 genes from 
each of the 13 species, relying on genomic data for the 12 
Drosophila species and transcriptome data for 5. flava. To gain 
further insight into the timing of the evolution of herbivory, 
we also sequenced nine genes (seven nuclear and two mito- 
chondrial) from an expanded taxonomic sample of 
leaf-mining and saprophagous Scaptomyza species and 
estimated phylogenetic relationships among these species 
and other drosophilids. In both analyses, we used indepen- 
dent evidence to calibrate nodes and estimate the timing 
of the origins of herbivory, and we evaluated nodes for 



statistical support (supplementary materials and methods, 
Supplementary Material online). 



GUS Histochemical Assay 

To determine whether feeding by 5. flava larvae induces tran- 
scription of genes essential for glucosinolate biosynthesis, 
cyp79£2-promoter-GUS fusion Arabidopsis plants (Millet 
et al. 2010) were grown at 22°C and 50% relative humidity 
with a 16h:8h lightdark cycle. These reporter lines yield a 
qualitative measure of cyp79B2 gene expression. At 32 days 
following germination, a single third instar larva reared on 
Col-0 wild-type (WT) Arabidopsis plants was placed on each 
of two leaves per plant and allowed to feed for 12 h. At the 
same time, two leaves per mechanical control plant were 
crushed using a forceps three times mimicking the amount 
of area consumed by the larva at each time point. Although 
this treatment is unlikely to precisely mimic active feeding by 5. 
flava larvae, we damaged roughly the same leaf area as a larva 
would over a 12-h period. An undamaged control was left 
untouched. Infiltration with GUS substrate solution and tissue 
fixation followed Millet et al. (2010). Images were visualized 
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using a Leica MZ6 dissecting scope with a 0.5 x lens, and 
results were analyzed qualitatively. 

Performance Assays on Wild-Type and Glucosinolate 
Knockout Arabidopsis thaliana Lines 

To determine whether glucosinolates reduce growth rates of 
5. flava, we grew WT (Col-0) Arabidopsis, aliphatic glucosino- 
late mutant {myb28myb29) (aliphatic glucosinolate knock out 
[AGKO]), indolic glucosinolate mutant (cyp79b2cyp79b3) 
(abbreviated IGKO), and an indolic and aliphatic glucosinolate 
mutant (cyp79b2cyp79b3myb28myb29) (glucosinolate knock- 
out [GKO]) plants to 4 weeks of age (see supplementary 
materials and methods, Supplementary Material online, for 
details). We allowed approximately 400 gravid female 
5. flava flies to create feeding punctures (stipples) and lay 
eggs in WT (A/ =60 plants), AGKO (A/ =60 plants), IGKO 
(A/ =60 plants), and GKO (A/ =60 plants) Arabidopsis plants 
for 24 h by randomly arraying five plants of each genotype in a 
cage and repeating this for all 240 plants. For each plant indi- 
vidual, we quantified the number of stipples created by female 
flies and egg number. We compared the intercepts and slopes 
of stipple number versus egg number regression lines across 
the four plant genotypes using an analysis of covariance 
(ANCOVA) approach. After hatching, larvae were allowed to 
feed on leaves for either 3 or 5 days. We compared average 
larval masses across plant genotypes for each cohort (days 3 
and 5) also using an ANCOVA (using the aov function in the 
statistical package R v. 2.1 5.0) to test for effects of crowding 
on larval growth rates as there were multiple larvae per plant. 
A subset of larvae harvested at 5 days post-egg hatching were 
flash frozen in LN 2 vapor and stored at -80° C. A subset of 
these 5-day-old larvae, two pools from two host plant lines 
(WT and GKO), were used for RNA-sequencing on the lllumina 
GAII platform. 

In a separate experiment, the same plant lines were reared 
as above, and at 4 weeks, individuals of Col-0 (A/ = 1 5 plants) 
and GKO (A/= 1 5 plants) were placed with approximately 50 
ovipositing 5. flava females for 24 h. Each plant was then iso- 
lated in a clear acrylic rearing box, placed in randomized 
arrays under lights as above, and watered to keep soil mois- 
ture constant, and larvae were allowed to develop freely. 
Beginning on day 15 post-egg hatch, we recorded the 
number of adult flies that emerged twice each day. 
Development time was calculated as the average number of 
days from egg hatch to adult emergence for a cohort of flies 
that emerged from a single plant. 

Metabolic Profiling of Arabidopsis Lines Used in 
Herbivore Performance and Transcriptional Profiling 
Experiments 

We conducted nontargeted metabolite profiling of polar com- 
pounds (sugars, hydroxy- and amino acids) by fractionation 
and gas chromatography/mass spectrometry. Glucosinolates 



are not detectable by this protocol and were not analyzed 
using other means because the levels in GKO plants have 
been previously characterized and shown to be undetectable 
(Muller et al. 2010). 

We used 4-week-old WT and GKO (myb28myb29cyp79b2- 
cyp79b3) Arabidopsis plants as above, which were grown 
under a 16 h light:8h dark cycle (using Gro-Lux [Sylvania, 
Danvers, MA, USA] bulbs) at 22°C, 50% relative humidity, 
in a Conviron reach-in Adaptis growth chamber at the 
University of Arizona. Five plants of each genotype were 
flash frozen in LN 2 and shipped on dry ice to the Proteomics 
and Metabolomics Facility at Colorado State University (http:// 
www.pmf.colostate.edu/) for extraction and GC-MS analysis 
(details given in the supplementary materials and methods, 
Supplementary Material online). 

454 Titanium GSFLX Transcriptome Sequencing 

RNA Preparation, cDNA Synthesis, and Normalization 

In 2008, we froze under LN 2 vapor fresh eggs, larvae, pupae, 
and adults of 5. flava from a population cage of flies derived 
from larvae collected from leaves of Barbarea vulgaris 
(Brassicaceae) in Belmont, MA, and subsequently reared on 
Arabidopsis Col-0 and Tsu-0 accessions. A subset of the adults 
was septically injured with cultures of Escherichia coli, Serattia 
marcescens, Pseudomonas syringae, or Staphylococcus aureus 
to maximize gene expression from as many loci as possible (De 
Gregorio et al. 2001 ; Lemaitre and Hoffmann 2007). Details of 
the RNA preparation, cDNA synthesis, and normalization can 
be found in the supplementary materials and methods, 
Supplementary Material online. 

Library Preparation (DNA Processing) for 454 GS FLX 

We followed previously published protocols to prepare 
the 454 GS FLX library (Toth et al. 2007). cDNAs were nebu- 
lized and size selected for an average size of 400-800 bp. 
454 GS FLX Titanium-specific adapters, AdapterA and 
AdapterB, were ligated to the cDNA ends after end polishing. 
The adapter-ligated DNAs were then mobilized to the library 
preparation beads, and sst-cDNAs were captured. The 
number of molecules of the sst-cDNAs was calculated using 
the concentration and average fragment length for emPCR. 

454 GSFLX Sequencing 

We sequenced the library on a 454 GSFLX flow cell using 
titanium chemistry at the University of Illinois. The raw reads 
were processed as described later and combined with the 
lllumina-generated data to create a grand assembly. 

lllumina GAII 72 bp Transcriptome Sequencing 
RNA Preparation 

Larvae harvested 5 days post-egg hatch were placed in tubes 
directly after harvesting from plant leaves, and masses were 
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obtained for performance studies described earlier. Four larval 
pools were obtained, two pools from Col-0 plants (pool 1, 
A/=20 larvae; pool 2, A/= 18 larvae) and two from GKO 
plants (pool 3, N= 12 larvae; pool 4, N= 12 larvae). Larvae 
were flash frozen in LN 2 and then crushed in a Qiagen Tissue 
Lyser (Retsch Mixer Mill) and homogenized with 500 \i\ TRI 
REAGENT per sample, generally following the RNA extraction 
protocol above (for details see supplementary materials and 
methods, Supplementary Material online). Genomic DNA was 
removed using a gDNA Eliminator spin column (Qiagen, Inc.). 
A sample of RNA from each of the four RNA pools was run on 
an Agilent 2100 Bioanalyzer for (RNA Nano Chip) to deter- 
mine the integrity of the RNA samples. 

mRNA Isolation, cDNA Synthesis, and Library Preparation 
(DNA Processing) for lllumina Sequencing 

The lllumina mRNA sequencing sample prep kit was used to 
prepare libraries for each of the four RNA pools, resulting in 
four RNA-seq libraries that were subjected to clustering and 
72 bp sequencing on an lllumina GA II instrument, following 
the manufacturer's protocol. These libraries were not 
normalized. 

454 and lllumina Data Assembly 

Quality Control and Preprocessing of Sequencing Data 

The RNA-seq data consisted of a full plate of 454 sequencing 
(990,223 reads total, from normalized cDNA pools) and four 
lanes of 72 bp lllumina sequencing [73,147,138 reads total, 
from un-normalized cDNA pools from 5-day-old larvae raised 
on either WT or the GKO (cyp79b2cyp79b3myb28myb29) 
Arabidopsis plants]. Quality control and preprocessing details 
can be found in the supplementary materials and methods, 
Supplementary Material online. 

Contig Assembly 

The filtered reads were initially assembled using the software 
program AbySS version 1 .2.1 (Birol et al. 2009; Simpson et al. 
2009) as is described in detail in the supplementary materials 
and methods, Supplementary Material online. 

Homology Assessment and Scaffolding 

To identify the closest homolog of each assembled 5. flava 
contig, we first used blastx to map contigs to proteins from 
D. grimshawi, with an E value cutoff of 1e— 10. We retained 
all hits for which at least 20% of the 5. flava contig hits a 
D. grimshawi protein; we then filtered out all hits with E values 
more than 10 orders of magnitude worse than the best hit. 
We also ran an all-against-all blastn search with an E-value 
cutoff of 1e— 10 and using the 5. flava contigs as both query 
and subject to identify potential paralogs that were not col- 
lapsed by our assembly procedure. For this blastn run, we kept 
all hits, regardless of E value, where the hit covered at least 



90% of the smaller contig. We then identified assembled 
transcript regions (transcripts) as sets of contigs and D. grim- 
shawi proteins that are all connected to each other via blastn 
or blastx hits, using an undirected graph algorithm imple- 
mented in custom Perl scripts. We then generated three dif- 
ferent transcript sets based on our confidence in assignment. 
The high confidence set is based on blastx hits with at least 
85% coverage; the medium confidence set is based on blastx 
hits with between 60% and 85% coverage; and the low con- 
fidence set is based on blastx hits with between 20% and 
60% coverage. In all three sets, we flagged contigs if they 
1) were less than 200 bp and a singleton, 2) had a significant 
blast hit to a non-Drosophila sequence in nr, but not to any 
Drosophila, or 3) had no blast hits with at least the minimum 
20% query coverage. For all confidence sets, we built assem- 
blies by merging adjacent or overlapping 5. flava sequence 
that maps to a given D. grimshawi protein and filling in regions 
of the D. grimshawi protein not sequenced in our cDNA 
libraries as Ns. These merged sequences form the basis of 
the molecular evolutionary analysis (supplementary table S5, 
Supplementary Material online) and differential regula- 
tion analysis described later (supplementary table S6, 
Supplementary Material online). 

By scaffolding contigs and merging potential S. flava para- 
logs identified via all-against-all blastn searches, we collapsed 
the original contigs to assembled transcripts. These transcripts 
were then filtered to remove short contigs and those with 
weak or complex blast hits, which served as the transcriptome 
against which the short-reads from lllumina sequencing of the 
5-day-old larvae from WT and GKO plants were mapped to 
quantify differential regulation of 5. flava genes. 

Quantifying Differential Gene Regulation Using the 
Transcriptome and RNA-Seq 

Here, we used RNA-seq to measure differential expression 
between the transcriptomes of flies raised on WT and GKO 
Arabidopsis. We mapped each of our four filtered lllumina 
RNA-seq samples against the 5. flava (scap) contig set derived 
from all transcriptome data using Bowtie 0.12.5 (Langmead 
et al. 2009) with the following command line parameters: 
-S -n 2 -e 80 -I 25 -nomaqround -y -a -best -strata -m 1 
-solexal .3-quals -sam-nosq -p 2. We were able to uniquely 
map between 37.9% and 43.79% of reads in each sample to 
our filtered contig data set, representing 28.2 million total 
mapped sequences. We then counted the number of reads 
mapped to each contig and summed across all contigs in each 
transcript to generate a count of reads mapped to each tran- 
script in each condition. We tested for differential expression 
of transcripts in larvae reared on the two Arabidopsis lines by 
first summing the reads mapped across all contigs in each 
transcript to generate a count of reads mapped to each 
transcript in each pool, and estimating log 2 -fold change per 
treatment (WT-GKO) using a negative binomial model 
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implemented in the R/Bioconductor package DESeq (Anders 
and Huber 2010). 

Estimating Rates of Molecular Evolution 

The ratio of nonsynonymous substitutions per nonsynon- 
ymous site normalized by the number of synonymous substi- 
tutions per synonymous site (c/ N /c/ s ) was calculated to estimate 
rates of molecular evolution in 5. flava and across the subge- 
nus Drosophila, within which the genus Scaptomyza is nested. 
First, we identified all one-to-one homologs present in S. flava 
and the three species in the subgenus Drosophila with se- 
quenced genomes: D. grimshawi, Drosophila virilis, and 
Drosophila mojavensis (details provided in the supplementary 
materials and methods, Supplementary Material online). 

To estimate d^/d s for each alignment, we used a maximum 
likelihood framework implemented in the codeml module of 
PAML version 4 (Yang 2007). The unrooted input tree 
(D. virilis, D. mojavensis, (D. grimshawi, S. flava #)) was used 
with the two-ratio branch model, which estimates one d N /d s 
in a specified foreground branch (5. flava in this case) and a 
separate c/ N /c/ s across the rest of the tree. As a quality control, 
we excluded from analysis all homolog sets where greater 
than 5% of sites were variable in the 5. flava contig or 
where d s in S. flava was greater than 0.81 58 (two times the 
median d s across the full 5. flava data set), which may repre- 
sent contig misassembly or sequence misalignment, respec- 
tively. We also excluded sets where d s in 5. flava was less than 
0.01 or less than 70 codons were present without gaps in all 
four species to avoid unreliable estimates of d N /c/ s due to in- 
sufficient data. Results of all these analyses are reported in 
supplementary table S6, Supplementary Material online. 

Filtering Developmentally Regulated Genes 

We mined a high-throughput gene expression data set con- 
structed using D. melanogaster, in which Graveley et al. 
(2011) used lllumina RNA-seq to quantify gene expression 
across 30 developmental stages in isogenic (y 1 ; cn bw 1 sp 1 ) 
D. melanogaster flies. Raw expression values expressed as 
reads per kilobase per million reads mapped (Graveley et al. 
201 1 ) were obtained from the analysis of Gelbart and Emmert 
(2010), which intersected coverage expression data with 
FlyBase exons. 

We excluded data from D. melanogaster genes that had 
been split into multiple separate gene models as of 5 
September 2011. A gene was flagged as developmentally 
confounded if the log 2 (fold change) between 1) second 
instar larvae and larvae 12 h into third instar or 2) 12-h 
third instar larvae and larvae that had just entered the third 
instar puffstage, was greater than 50% of the log 2 (fold ex- 
pression change) between GKO- and WT-fed larvae in our 
experiment. 



Results 

Timing of the Evolution of Herbivory and Host 
Specialization 

The date uniting the Scaptomyza + Hawaiian Drosophila using 
nine gene fragments (29.76 ±2.25 millions of years before 
present [MYBP]) was very close to the one estimated for this 
node using 5. flava transcriptomic data and genomic data for 
the 12 sequenced Drosophila species (27.06 ±0.66 MYBP) 
(table 1). The divergence time for the two herbivorous taxa 
included in this analysis (5. flava and 5. nigrita) was estimated 
to be 6.31 (± 0.63) MYBP, and the divergence time between 
this clade and S. pallida, which is not herbivorous, was esti- 
mated to be 14.70 (±1 .26) MYBP. These dates give a conser- 
vative estimate of the range (6-16 MYBP) within which 
herbivory and mustard specialization likely evolved. 

Phenotypic Consequences of Arabidopsis Antiherbivore 
Defenses on 5. flava 

We first tested whether herbivory by S. flava larvae activated 
the glucosinolate biosynthetic pathway in Arabidopsis. 
We qualitatively assayed transcriptional activity of the 
Arabidopsis cyp79b2 gene in response to feeding by third 
instar 5. flava larvae that were placed on adult Arabidopsis 
cyp79b2promoter.p-glucuronidase (GUS) reporter plants and 
allowed to form mines for 1 2 h. Arabidopsis cyp79b2 encodes 
a cytochrome P450 enzyme required for indolic glucosinolate 
biosynthesis (Hull et al. 2000; Muller et al. 2010). Although 
there was only modest transcriptional (GUS) activity in leaves 
that were mechanically wounded (repeatedly) with stainless 
steel forceps, the Arabidopsis cyp79b2 promoter drove strong 
transcriptional induction in leaf tissue surrounding actively 
feeding larvae (fig. 2). Little transcriptional activity was ob- 
served in unwounded control leaves (fig. 2). 

We next tested whether glucosinolates are involved in re- 
sistance against 5. flava by rearing larvae from birth on WT 
Arabidopsis or three Arabidopsis glucosinolate mutant lines 
(AGKO, IGKO, and GKO lines). We first exposed 60 plants 
of each of the four genotypes to 5. flava females. The main 
purpose of this was simply to allow eggs to hatch in plants for 
the larval performance and transcriptional profiling experi- 
ments. However, we used it as an opportunity to test 
whether female flies varied in their feeding or oviposition pref- 
erences across the plant lines. We regressed egg number on 
stipple number and stipple number explained 54%-66% of 
the variation in egg number across the four host plant lines 
(supplementary fig. S2, Supplementary Material online). 
Although the slopes of these regression lines did not differ 
significantly among host plant lines (ANCOVA, F= 0.8844, 
P=0.45), the intercepts were significantly lower for GKO 
(6=1.19, t= -2.632, P= 0.0091) and AGKO (6 = -1.66, 
t= -2.656, P= 0.0084) plants, indicating that these geno- 
types, which both are lacking aliphatic glucosinolates, were 
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Table 1 

Ages of Nodes in Figure 1 and Supplementary Figure S1, Supplementary Material Online, Inferred by r8s Analysis 



Node 


Calibration Range (MYBP) 


Transcriptome Alignment, 
AgeiSD (MYBP) 


Multiple Scaptomyza spp. 
Age ± SD (MYBP) Bootstrap Support 


1 


50.5-75.3 


74.81 ±1.14 


71.19±4.10 




2 




62.55 ±0.74 


61. 44 ±3.90 


100 


3 




48.85 ±0.97 


50.72 ±3.01 


100 


4 


35.3-53.1 


35.34 ±0.73 


36.75 ±1.97 


100 


5 




8.80 ±1.38 


11. 40 ±0.84 


100 


6 




6.48 ±0.80 


8.96 ±0.78 


57 


7 




4.50 ±0.70 


6.85 ±0.62 


100 


8 




2.31 ±0.52 


3.25 ±0.40 


100 


9 


0.56-1.14 


1.14±0.05 


1.14±0.00 


100 


10 


34.2-51.6 


43.26 ±0.74 


41 .70 ±2.92 


100 


11 




32.04 ±0.66 


31.82 ±2.57 


34 


12 


23.9-37.1 


27.06 ±0.66 


29.76 ±2.25 


100 


13 






21. 69 ±1.76 


100 


14 






14.70 ±1.26 


100 


15 






6.31 ±0.63 


100 



Note. — Node numbers refer to nodes labeled in figure 1 and supplementary figure S1, Supplementary Material online. Date intervals for calibrating are in the second 
column. Divergence dates are in millions of years before present (MYBP) followed by standard deviation of dates estimated by profiling nodes across 1,000 bootstrapped 
replicates. Two phylogenies were used: the first used characters from an alignment of nucleotide sequences from the 12 Drosophila genomes and Scaptomyza flava 
transcriptome, and the second used 9 genes from a broader sampling of drosophilids, including herbivorous and nonherbivorous Scaptomyza species. All bootstrap values for 
the tree generated for transcriptome alignment were 100. 




Fig. 2. — The indolic glucosinolate biosynthetic pathway is induced by 5. flava larval feeding. Activation of the indolic glucosinolate biosynthetic cyp79b2 
gene in the Arabidopsis thaliana promoter.GUS reporter line when attacked by S. flava larvae for 12 h. Strong induction occurs within and around larval 
mines (top right) but less so when leaves are mechanically wounded (bottom left) and induction appears absent in untouched leaves (top left). 
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more heavily attacked by female flies than WT or IGKO plants 
(supplementary fig. S2, Supplementary Material online). 

Using these same plants, 3 days after egg hatch, we ex- 
tracted 31 1 larvae from WT (N= 18 plants), 349 larvae from 
AGKO plants (A/= 18 plants), 285 larvae from IGKO plants 
(A/= 18 plants), and 246 larvae from GKO (A/= 18 plants) 
plants and recorded fresh larval mass. Five days after egg 
hatch, we extracted 351 larvae from WT (A/=36 plants), 
364 larvae from AGKO plants (A/=36 plants), 319 larvae 
from IGKO plants (A/=33 plants), and 280 larvae from GKO 
(A/= 34 plants) plants and recorded fresh larval mass as above. 
Results from the ANCOVA across all four host plant genotypes 
are presented in supplementary figure S3, Supplementary 
Material online. The number of larvae per plant was not a 
significant covariate with plant genotype (ANCOVA, plant ge- 
notype x number of larvae per plant, F 164 = 0.777, P=0.51). 
In an additive model, total number of larvae in each plant 
individual was not a significant factor on its own across geno- 
types (plant genotype + number of larvae per plant, day 3: 
F 1f67 = 2.365, P=0.129; day 5: F 1f135 = 0.842, P=0.36). 
However, plant genotype alone remained a significant factor 
in explaining average larval mass (F 667 = 6.767, P= 0.0005; 
day 5: F 3J35 = 7.462, P= 0.0001). In addition, we found no 
significant (a = 0.05) correlation between total number of 
larvae per plant with mass per larva using individual linear 
models for each plant line separately (supplementary fig. S3, 
Supplementary Material online). Post hoc tests showed that 
average masses of larvae reared on GKO plants were signifi- 
cantly larger than larvae reared on WT (fig. 3A) and AGKO 
plants (supplementary fig. S3, Supplementary Material online), 
3 days and 5 days post-egg hatching. Similarly, egg-to-adult 
development time was more rapid in S. flava reared on GKO 
plants than in WT plants. Cohorts of adult flies emerged from 
GKO plants a half day earlier than flies from WT plants 
(P< 0.05, two-sided f-test) (fig. 3B). 

Secondary plant compounds such as glucosinolates require 
amino acid and carbon precursors and are potentially costly to 
produce (Heil 2002). Although the GKO Arabidopsis plants are 
phenotypically nearly identical to WT to human observers, it is 
possible that the leaves of GKO plants, blocked from synthe- 
sizing glucosinolates, may retain a liberated pool of nutritive 
precursors such as sugars and amino acids. The observed in- 
crease in 5. flava growth rate in GKO relative to WT plants 
could be explained by a buildup of these plant primary me- 
tabolites. To test this possibility, we profiled primary metabo- 
lites in above ground tissues of 4-week-old plants of each 
genotype (GKO and WT, N= 5 plants each) by fractionation 
and gas chromatography-mass spectrometry (GC-MS). None 
of the 7,893 major ion features (e.g., sugars or amino acids) 
detected in this survey were significantly higher (P< 0.01) in 
GKO than WT plants (supplementary table S1 , Supplementary 
Material online). Similarly, only five compounds with best 
matches to the following sugars were enriched in WT plants 
relative to GKO: arabinofuranose, galactofuranose, turanose, 



1 ,6-anhydro-B-D-glucose, and B-DL-arabinopyranose (raw 
peak areas for each feature for all samples given in supple- 
mentary table S1, Supplementary Material online). 

The GKO mutant we used is also deficient in an indole 
alkaloid phytoalexin called camalexin (3-thiazol-2'-yl-indole) 
(Zhao and Last 1996). Camalexin is synthesized first from 
tryptophan, which is converted to indole acetaldoxime by 
CYP79B2 and CYP79B3 (Glawischnig et al. 2004). Because 
the GKO is null for cyp79b2 and cyp79b3, we cannot exclude 
the hypothesis that S. flava performance and transcriptional 
phenotypes tested on the GKO mutant are due in part to the 
absence of camalexins. Camalexins have antimicrobial prop- 
erties and could potentially have antiherbivore propteries 
(Rogers et al. 1996). 

Sequencing and Assembly of the 5. flava Transcriptome 

The results described earlier suggest that glucosinolates are an 
important defense against 5. flava and might have been a 
colonization barrier during or after the evolution of herbivory 
in the lineage. We reasoned that S. flava genes that respond 
transcriptionally to feeding on glucosinolate-bearing plant tis- 
sues were likely to be involved in the evolutionary transition to 
herbivory. We therefore used next-generation sequencing 
(RNA-seq) to analyze the transcriptional responses of 5. flava 
flies reared on WT or GKO plants. We chose first to sequence 
a cDNA library from an RNA pool derived from mixed life 
stages (eggs, larvae, pupae, and adults) of an iso-female 
colony of 5. flava using 454 Titanium FLX chemistry. We com- 
bined this assembly with reads from four lllumina GA II 
RNA-seq reads of 5-day-old 5. flava larvae reared on either 
WT or GKO plants that are described later, to maximize 
our coverage. This resulted in a total of 104,118 preliminary 
contigs with an N50 of 289 (supplementary fig. S7, 
Supplementary Material online). 

Before further analysis, we screened these 1 04, 1 1 8 contigs 
for transposable elements, simple sequence repeats, and 
low-complexity regions using RepeatMasker with the default 
options and a DrosopMa-specific library (RepBase: 
repeatmaskerlibraries-20090604.tar.gz). We removed from 
further consideration 8,819 contigs with either evidence for 
TE contamination or with greater than 1 0% of their sequence 
consisting of simple sequence repeats or low complexity re- 
gions, leaving us with a total of 95,299 contigs. These form 
the basis of all our subsequent analysis and are referred to as 
the 5. flava contig set. We then used coding sequences 
from D. grimshawi (the closest fully sequenced sister taxon 
to 5. flava) as scaffolds to assemble contigs using blastx 
(Crawford et al. 2010). By scaffolding contigs and merging 
potential 5. flava paralogs identified via all-against-all blastn 
searches, we collapsed the original 95,299 contigs to 36,813 
or 48,872 (depending on homology cutoffs; see supplemental 
materials, Supplementary Material online) assembled tran- 
scripts. These transcripts were then filtered to remove short 
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Fig. 3. — The absence of aliphatic and indolic glucosinolate biosynthesis pathways in Arabidopsis increases weight gain and decreases development time 
in S. flava. (A) Larvae reared from eclosion in wild-type (WT) (Col-0) or quadruple glucosinolate knockout (GKO) (cyp79b2cyp79b3myb28myb29) Arabidopsis 
lines that were harvested at 3 and 5 days posteclosion weighed significantly more when reared on the glucosinolate knockout. (B) S. flava reared from eggs in 
GKO Arabidopsis plants develop more rapidly than those reared on WT plants. 



contigs and those with weak or complex Blast hits, resulting 
in a total of 16,476 high-confidence transcript models, 
which served as the transcriptome against which the short 
reads from lllumina sequencing of the 5-day-old larvae from 
WT and GKO plants were mapped to quantify differential 
regulation of 5. flava genes. 

Transcriptional Analysis of the 5. flava Response to 
Glucosinolates 

To measure differential expression between the transcrip- 
tomes of flies raised on Arabidopsis plants with and without 
glucosinolates, we collected 5-day-old larvae from the perfor- 
mance study, which were reared in WT and GKO plants. Using 
two pools of RNA for each treatment, we conducted an 
RNA-seq experiment using lllumina 75 bp sequencing. We 
identified a total of 341 transcripts that were differentially 
expressed between 5-day-old 5. flava larvae reared on WT 
and GKO Arabidopsis (fig. 4). Of these 341 5. flava transcripts, 
278 (82%) were significantly upregulated (induced) and 63 
(18%) were significantly downregulated (repressed) in larvae 
reared on WT relative to GKO plants. Glucosinolates in the diet 
of 5. flava thus appear to induce many more transcripts than 
they repress. We found one-to-one D. melanogaster orthologs 
for 121 5. flava transcripts: 105 of 278 induced 5. flava tran- 
scripts and 16 of 63 repressed 5. flava transcripts. Gene 
ontology (GO) categories for 76 of these 121 differentially 
regulated 5. flava transcripts with homologous D. melanoga- 
ster genes could be inferred in AmiGO (Ashburner et al. 2000), 
and there was an enrichment of GO categories related to 
hemolymph coagulation (biological process), plasma mem- 
brane (cellular component), and structure constituent of the 
cuticle/chitin (molecular function) (table 2). 



Stress-Related Genes Are Regulated by Glucosinolates 
in S. flava 

We evaluated the data set of 121 differentially regulated 
S. flava transcripts with D. melanogaster homologs in two 
ways. First, we compared these transcripts with the set of 
D. melanogaster genes activated or repressed by exposure 
to various stressors, including aging (Landis et al. 2004), 
the dietary toxin phenobarbital (Misra et al. 2011), heat 
(Sorensen et al. 2005), oxidation (Landis et al. 2004), and 
starvation (Harbison et al. 2005). Two genes induced by 
at least one stressor in D. melanogaster are homologous 
to 5. flava transcripts induced by the presence of glucosino- 
lates in the diet (table 3). This analysis suggests that there is 
at least some overlap between genes modulated by 
known physiological stressors in D. melanogaster and by the 
presence/absence of glucosinolates in larval host plants of 
S. flava. 

The 5. flava performance and growth rate experiments 
show that flies reared in WT plants are developmental^ de- 
layed relative to those reared on GKO plants (fig. 3). Thus, 
developmentally regulated genes that do not respond to die- 
tary glucosinolates directly may confound gene expression dif- 
ferences between larvae reared on GKO or WT plants. To 
compensate for this potential confounding factor, we ex- 
cluded 5. flava transcripts from further analysis if developmen- 
tal expression differences in the D. melanogaster homolog 
between second and early third instar larvae (Graveley et al. 
201 1), corresponding to possible stages at which larvae were 
harvested, explained >50% of the log 2 fold expression differ- 
ence in 5. flava larvae reared on WT versus GKO plants. 
We found 1 1 transcripts induced by glucosinolate consump- 
tion and 10 transcripts repressed by glucosinolate consump- 
tion in 5. flava that passed this extremely conservative test 
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Fig. 4. — Scatterplot showing differentially regulated transcripts from 5. flava 5-day-old larvae raised on WT versus GKO plants. Transcripts that were 
significantly induced by glucosinolates in the diet of 5. flava are in red, and those repressed by glucosinolates in the diet are in purple. Two RNA pools 
(biological replicates) for each treatment were each sequenced on single lllumina lanes (four lanes total). These sequences were mapped back to an 
assembled transcriptome that was scaffolded on the D. grimshawi genome. A total of 341 transcripts were differentially regulated. Larvae used for this 
analysis were obtained from those used in the weight-gain analysis presented in figure 3. 



Table 2 

Selected Gene Ontology Enrichment for 76 of 121 Differentially Regulated Scaptomyza flava Transcripts with Homologs in Drosophila melanogaster 



GO Category 



GO Term 



P Value Sample Frequency 



Background 
Frequency 



Genes 



Biological process 


0050817, coagulation, 
metamorphosis 


2.17e-03 


3/76 


6/13,178 


Muc68Ca, Hml, fon 




0010171, body 


4.11e-03 


4/76 


22/13,178 


TwdlR, TwsdlS, TwdlW, TwdlBeta 




morphogenesis 










Cellular component 


0031226, intrinsic to 
plasma membrane 


1.95e-03 


9/76 


211/13,178 


sas, Osi19, Osi6, Os\7, Osi18, 
Osi15, Osi9, Os\2, Osi20 


Molecular function 


0042302, structural 
constituent of cuticle 


3.89e-07 


11/76 


146/13,178 


Cpr72Ec, CG7548, TwdlR, TwdlS, 
CG8543, TwIdW, CG8927, CG8541, 
Cpr64Ab, dyl, TwIdBeta 



(table 3); for these 21 transcripts, developmental differences 
between larvae are unlikely to explain the transcriptional dif- 
ferences we observed. Of the 1 1 induced loci, two were 
among the four found to be induced by other stressors in D. 
melanogaster (table 3), and of the 1 0 repressed loci, five were 
found to be repressed by other stressors in D. melanogaster. 
This overlap suggests that the observed correlation between 
the transcriptional response to stress and the transcriptional 



response to dietary glucosinolates is not an artifact of devel- 
opmental differences between larvae in the two samples. 

Metabolic, tissue remodeling, and wound-associated genes 
were among the general classes of the 21 genes affected by 
glucosinolate uptake in 5. flava (table 3). Thus, a set of 
stress-related genes appear to be induced by glucosinolates 
in the diet of even specialized insects that have adapted to 
grow in the presence of glucosinolates. 
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Evolutionary History of Genes Differentially 
Regulated by Glucosinolates 

The assembled transcriptome allowed us to study the evolu- 
tionary history of glucosinolate-regulated 5. flava transcripts 
for which we could find homologs in the other Drosophila 
species. We tested whether these transcripts exhibit more 
rapid rates of evolution than the bulk of the transcriptome, 
as found for immunity and stress-related genes in D. melano- 
gaster (Sackton et al. 2007). We used alignments of all single 
copy orthologs in the S. flava transcriptome and D. grimshawi, 
D. virilis, and D. mojavensis genomes, to conduct a fore- 
ground-background analysis in PAML (Yang 1998, 2007) to 
estimate co (c/ N /c/ s , the rate of nonsynonymous substitutions 
per nonsynonymous site/synonymous substitutions per syn- 
onymous site). Using our drosophilid species tree (fig. 1), we 
examined co on the branch leading to S. flava (foreground) and 
across the rest of the subgenus Drosophila (background). With 
only three evolutionarily proximate taxa in the subgenus 
Drosophila, we lacked sufficient power to test for positive se- 
lection (a subset of codons with c/ N /c/ s >1) in the branch lead- 
ing to 5. flava. 

Transcripts that were more abundant in flies that fed on 
WT plants (those with glucosinolates; A/= 62 transcripts) had a 
significantly higher co value in the 5. flava lineage than tran- 
scripts whose expression was unaffected by glucosinolates 
(A/ = 4,711) (P=0.036, Mann-Whitney U test) (fig. 5/4). 
Interestingly, this pattern is not specific to 5. flava, as the in- 
duced transcripts also have significantly elevated co values in 
the background lineages included from the subgenus 
Drosophila (D. grimshawi, D. mojavensis, and D. virilis) com- 
pared with the unregulated transcripts (P= 0.043, Mann- 
Whitney U test). Accelerated evolution of genes induced by 
glucosinolates in 5. flava is consistent with other studies show- 
ing that genes involved in stress responses tend to evolve rap- 
idly (Li et al. 2003, 2004; Clark et al. 2007; Low et al. 2007; 
Sackton et al. 2007; Thomas 2007; Matzkin 2008). 

Despite a concerted search for conserved domains on 
GenBank, we were able to identify D. melanogaster homologs 
of only 36% of genes (121 of 341) that were differentially 
regulated by glucosinolates in the diet of 5. flava. More gen- 
erally, a substantial number (5,967) of the 16,476 transcripts 
we identified in 5. flava have no identifiable homologs in other 
species (supplementary table S6, Supplementary Material 
online). To provide reliable estimates of the extent of 
5. f/ava-specific transcripts, we therefore focused on an anal- 
ysis of unassembled contigs, rather than scaffolds, because by 
definition scaffolds can only be assembled for transcripts with 
a D. grimshawi homolog. We used conservative criteria to 
define 5. f/ava-specific contigs, requiring them to be at least 
200 bp in length and to fail to produce a significant 
(£< 1e— 10) blast hit against both the NCBI nonredundant 
protein database (nr) and DrosopMa-specific databases. 
Using these criteria, 6,036 of 79,947 contigs were 5. flava 



specific (7.55%). We then compared the proportion of con- 
tigs in the induced, repressed, and nonregulated expression 
categories that have no known homologs outside of 5. flava 
(fig. 5B). The set of contigs that were induced or repressed in 
5-day-old larvae reared on WT versus GKO plants had more 
candidate 5. f/a\/a-specif ic contigs than those not regulated by 
the presence of glucosinolates (x 2 = 12.8346, df=2, 
P<0.01). This suggests that adaptation to glucosinolates in 
S. flava potentially involved novel genes. A preliminary blast 
against newly available genomic sequencing reads from 
5. flava confirms that they are of fly origin (see supplementary 
materials and methods, Supplementary Material online). A 
total of 99.3% (149/150) of the 5. f/ava-specific transcripts 
that were differentially regulated by glucosinolate consump- 
tion were represented in the preliminary genome assembly 
(blastn, E-value cutoff: 1e— 10). 

Discussion 

Herbivory (leaf mining) in the drosophilid fly genus 
Scaptomyza likely evolved relatively recently (6-16 MYBP) 
compared to the evolution of herbivory in the dominant 
orders of herbivorous insects. Because of this relatively 
recent acquisition of a phytophagous life history and the 
many genomic tools available for drosophilids and 
Arabidopsis, 5. flava-Arabidopsis interactions represent an at- 
tractive and promising model system to study the evolution- 
ary, ecological, and molecular aspects of genome adaptations 
that occur during the transition to herbivory in insects. 

We primarily focused on the interactions between 5. flava 
and glucosinolates, the major secondary compounds in their 
most typical host plants. Arabidopsis plants activate the indolic 
glucosinolate biosynthetic pathway in response to feeding by 
S. flava larvae, and 5. flava, which is mostly restricted in host 
range to mustard plants (Martin 2004), is at least partially 
susceptible to glucosinolates synthesized by Arabidopsis as 
evidenced by reduced growth rate on WT versus GKO 
plants. Interestingly, 5. flava larvae reared on WT plants 
were not significantly different in average mass from those 
reared on AGKO plants. AGKO plants still have indolic gluco- 
sinolates, and this suggests an important role for indolic glu- 
cosinolates or indole-derived camalexins in mediating 
quantitative resistance against 5. flava. Rapid development 
time may be an advantage in leaf-mining species, which are 
heavily attacked by parasitoid wasps throughout their devel- 
opment (Connor and Taverner 1997; Whiteman et al. 201 1). 
Because 5. flava feeds mostly, but not exclusively, on plants 
in the order Brassicales, the finding of reduced performance 
of 5. flava on WT versus GKO plants is not surprising. 
Oligophagous herbivores with broader host ranges, such as 
5. flava, are thought to be less efficient at coping with plant 
secondary compounds than are specialist species that feed on 
a more restricted range of hosts (Li et al. 2004). Future studies 
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dietary glucosinolates in 5. flava evolve more rapidly than genes unaffected in expression by glucosinolates. Median d^/d s values are significantly higher in loci 
induced by dietary glucosinolates in 5. flava compared with loci not induced by glucosinolates. (B) Results of comparison of the proportion of 5. flava contigs 
lacking identifiable homologs in each expression class. The difference among classes is significant by a x 2 test (x 2 = 1 2.8346, df = 2,P= 0.001 63). Candidate 
5. f/ava-specific transcripts are significantly more abundant among transcripts induced or repressed by glucosinolates in the diet of 5. flava versus those that 
are uninduced by glucosinolates in the diet of 5. flava. 



912 Genome Biol. Evol. 4(9):900-916. doi:10.1093/gbe/evs063 Advance Access publication July 19, 2012 



Genes Involved in the Evolution of Herbivory 



GBE 



should dissect the independent role of camalexins and indolic 
glucosinolates in medating resistance against 5. flava. 

The assembly of an 5. flava transcriptome allowed us to 
identify 341 S. flava genes that were differentially regulated by 
glucosinolates, most of which were induced rather than re- 
pressed. Only a limited number of transcriptional profiling 
studies have been carried out on herbivorous insects, including 
those in which larvae feed on plants with disabled defense 
pathways (Govind et al. 201 0; Alon et al. 201 1 ) or on different 
plant species with alternative toxins (Grbicetal. 2011). In each 
of these cases, hundreds of genes are differentially regulated 
by the presence of plant secondary compounds, a pattern 
similar to the one found here. However, in the case of 
5. flava, the benefit of being able to compare the transcrip- 
tome to the exceptionally well curated and annotated D. mel- 
anogaster genome greatly facilitated assignment of putative 
functions for differentially expressed genes, as well as the dis- 
covery of newly evolved loci. 

The most prominent functional category identified was for 
stress-related genes. The fact that stress-related genes are 
regulated by glucosinolates in 5. flava is intriguing because 
most specialist (monophagous) and oligophagous herbivores 
are thought to have evolved efficient and highly specialized 
detoxification systems to minimize physiological impacts of 
glucosinolate consumption. For example, Pieris spp. butterfly 
larvae do not encounter toxic glucosinolate breakdown prod- 
ucts, such as isothiocyanates, and instead have evolved ways 
of preventing their formation (Wittstock et al. 2004). 

We found that the 5. flava homolog of the central D. mel- 
anogaster translational regulator Thor is particularly interesting 
because it is induced by bacterial infection, phenobarbital, 
oxidation stress, and starvation in D. melanogaster (Bernal 
and Kimbrell 2000; Landis et al. 2004; Harbison et al. 2005; 
Misra et al. 2011). Thor also serves as a "metabolic brake" 
during stressful conditions in D. melanogaster (Teleman et al. 
2005). A homolog of Thor, 4E-BP1, is induced by phenethyl 
isothiocyanate in human cancer cell lines suggesting that there 
is conservation in this response to isothiocyanates across meta- 
zoans (Hu et al. 2007). Other differentially regulated 5. flava 
genes with D. melanogaster homologs include the Tweedle 
genes, a large group of genes involved in formation of the 
insect cuticle (Cornman 2009). Isothiocyanates could poten- 
tially interact with the cuticle (and chitin) directly (McCormick 
et al. 1 980), resulting in regulation of genes involved in cuticle 
and peritrophic membrane formation (Whiteman et al. 201 1) 
and maintenance due to the presence of isothiocyanates in 
leaves that are encountered by endophagous feeders such as 
5. flava (van Ommen Kloeke et al. 2012). Genes involved in 
blood coagulation were also among those found to be differ- 
entially regulated by isothiocyantes and this finding is consis- 
tent with effects of dietary isothiocyanates on animals from 
other taxa. For example, dietary isothiocyanates in rats in- 
crease the rate of blood coagulation (Idris and Ahmad 1975; 
Shan et al. 2010). 



Furthermore, we found that transcripts induced or re- 
pressed in larvae feeding on WT versus GKO plants are 
less evolutionarily conserved than transcripts that were not 
differentially regulated. We also uncovered a greater propor- 
tion of transcripts with no known homologs in the differen- 
tially regulated set, which could represent novel genes in this 
lineage or in the subgenus Drosophila. Taken together, our 
data indicate that the adaptation to mustard oil-bearing 
plants, and potentially herbivory in the lineage Scaptomyza, 
involved elaboration of pre-existing loci, potential exaptation 
of existing, rapidly evolving, stress-related genes, and the 
emergence of entirely new genes. 

The functional basis of adaptation to glucosinolates in the 
diets of herbivores has been investigated in great detail in 
several lineages of insect, including the lepidopterans Pieris 
rapae (Wheat et al. 2007) and Plutella xylostella (Ratzka 
et al. 2002), the orthopteran Schistocerca gregaria (Falk and 
Gershenzon 2007), the sawfly Athalia rosae (Opitz et al. 
2011), and the aphid Brevicoryne brassicae (Jones et al. 
2001 ). In each of these species, single enzymes were identified 
as essential for detoxification or sequestration of glucosinolate 
breakdown products that appear to allow for minimal contact 
with isothiocyanates (Muller et al. 201 0). In contrast, the iden- 
tities of the genes actually involved in detoxifying glucosino- 
lates in S. flava remain unknown. Our results suggest that 
5. flava may be directly encountering isothiocyanates given 
the significant growth rate reduction when cultured in WT 
versus GKO plants and the fact that hundreds of transcripts 
are differentially regulated by the presence or absence of glu- 
cosinolates in the diet of 5. flava. Hundreds of genes are also 
differentially regulated by isothiocyanates in experiments 
where humans and rats are given glucosinolates or isothiocy- 
anates (Hu et al. 2004; Bhamre et al. 2009). It is unknown 
whether other insects relatively specialized on Brassicales 
plants exhibit similar patterns to 5. flava or whether their de- 
toxification mechanisms are so efficient that gene expression 
differences between those that feed on plants with and with- 
out glucosinolates are minimal (Muller et al. 2010). 

Transcriptomic approaches in other insect taxa such as 
D. melanogaster would likely yield valuable insights into 
common pathways influenced by glucosinolates in insects 
and how these molecules affect insect physiology more gen- 
erally (Li et al. 2009). We did not detect transcriptional differ- 
ences in 5. flava homologs of most loci involved in induction of 
the Phase I or Phase II enzyme systems, which are candidate 
mustard oil detoxification proteins in humans and insects 
(Wadleigh and Yu 1988; Hu et al. 2004; Bhamre et al. 
2009). This may be due to examination of one time point 
corresponding to 5 days posteclosion. For example, elevated 
levels of transcripts of some of these genes, such as glutathi- 
one 5-transferases, which are involved in glucosinolate detox- 
ification in some insects and in humans (Wadleigh and Yu 
1988), are unlikely to remain elevated for such a long period 
of time (Tang and Tu 1995). Post-transcriptional regulation 
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could explain increased GST levels in some insects treated with 
toxins rather than increases in mRNA levels perse (Tang and 
Tu 1 995). Alternatively, structural changes could alter catalyitc 
efficiencies of genes encoding GSTs. 

In addition to providing insight into the ecological pressures 
faced by Arabidopsis in nature, analysis of a leaf-mining par- 
asite of Arabidopsis that is readily cultured in the laboratory 
and amenable to emerging genomic technologies may also 
have utility in agricultural research. Herbivorous insects cause 
major damage to crop plants around the world and are be- 
coming increasingly resistant to pesticides. Understanding 
molecular mechanisms used in overcoming natural insecticidal 
toxins such as glucosinolates may provide direct or indirect 
routes to novel insecticides or breeding strategies. Because 
5. flava attacks and completes development in all 
Arabidopsis ecotypes (accessions) and mutants that have 
been screened (Whiteman et al. 2011), this system provides 
a platform for dissecting the genetic bases of interactions be- 
tween plants and chewing herbivores. 

Supplementary Material 

Supplementary materials, figures S1-S7, and tables S1-S7 are 
available at Genome Biology and Evolution online (http:// 
www.gbe.oxfordjournals.org/). 
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